BEFORE WE START¶

(Based on the tutorial's README.md)

  • Miniforge should be installed

      $ conda init powershell  # only for Windows users - requires terminal restart
    
      $ conda activate 
    
      $ mamba create -n waw_ml python=3.10  # mamba/conda depending on what you use
    
      $ conda activate waw_ml
    
      $ pip install -r requirements.txt

NOTE: you might need to restart your VS Code

  • Choose the kernel waw_ml
In [5]:
# Please run this cell ONCE and restart your kernel.
!pip install -qU langchain_mistralai ipywidgets
In [6]:
from helper.custom_lllm import CustomLLM

from langchain_mistralai import ChatMistralAI
from langchain_core.output_parsers import StrOutputParser

from dotenv import load_dotenv
import os

# Load environment variables from the .env file
load_dotenv()

# Access the API key using os.getenv
api_key = os.getenv("MISTRAL_API_KEY")
api_key[:4] + "..." # Never print the full key; you can create your own key on Mistral: https://console.mistral.ai/api-keys/
Out[6]:
'A9xK...'

Loading the Model¶

(Locally or via API)¶

Make sure to run Mistral-7B locally with LM Studio

  • On the left column menu, select Developer (The green icon)
  • Select the model under Loaded Models: mistral-7b-instruct-v0.2.Q5_K_M.gguf
  • On the left side, click the button Start Server (Do not change anything in the settings below it)

You can see in the Server Logs that the model is accessible via http://localhost:1234/v1/
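This endpoint speaks the OpenAI-compatible chat-completions protocol, so you can also query it directly without LangChain. A minimal sketch (the payload fields are illustrative; the `model` value must match the file loaded in LM Studio):

```python
import json

# Hypothetical request payload for LM Studio's OpenAI-compatible endpoint
payload = {
    "model": "mistral-7b-instruct-v0.2.Q5_K_M.gguf",
    "messages": [{"role": "user", "content": "What is DLR?"}],
    "temperature": 0.7,
}

# With the server running, you would send it, e.g.:
#   import requests
#   r = requests.post("http://localhost:1234/v1/chat/completions", json=payload)
#   print(r.json()["choices"][0]["message"]["content"])
print(json.dumps(payload, indent=2))
```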

Loading models¶

In [13]:
model_local = CustomLLM()  # Load Mistral-7B from the LM Studio local server

model_api = ChatMistralAI(model="open-mistral-7b")  # Load Mistral-7B or Mixtral-8x7B through the API
# Note - The latest `open-mistral-nemo`: https://mistral.ai/news/mistral-nemo/
# All models: https://docs.mistral.ai/getting-started/models/

# CHOOSE THE MODEL YOU WANT TO USE

Test Run¶

In [6]:
# Create a prompt
prompt = '''
    What is DLR?
    '''

# Chain: pipes the model into the output parser
chain = model_local | StrOutputParser()

response = chain.invoke(prompt)
print("RESPONSE from Local Mistral (With Quantization)")
print("-"*len("RESPONSE from Local Mistral (With Quantization)"))
print(response)
RESPONSE from Local Mistral (With Quantization)
-----------------------------------------------
 DLR stands for Deutsches Zentrum für Luft- und Raumfahrt, which translates to the German Aerospace Center. It is a research center for aeronautics and spaceflight based in Germany. The organization's mission includes research and development activities in various areas of aerospace technology, including aircraft design, engine development, space exploration, and Earth observation. DLR operates its own research facilities, conducts research in collaboration with universities and industry partners, and also participates in international space programs such as those managed by NASA and the European Space Agency (ESA).

TRYITYourself_1

In [7]:
#  <TRYITYourself_1>: RERUN with Mistral API
chain_api = model_api | StrOutputParser()  # REDEFINE CHAIN
response = chain_api.invoke(prompt) # Invoke
print("RESPONSE using Mistral through API")
print("-"*len("RESPONSE using Mistral through API"))

print(response)
RESPONSE using Mistral through API
----------------------------------
DLR stands for Deutsches Zentrum für Luft- und Raumfahrt e.V., which translates to German Aerospace Center in English. It is a research center for aerospace, energy, transportation, and digitalization. DLR is a federal research center for aerospace, energy, transportation, and digitalization. Its mission is to conduct research and development work in aeronautics, space, energy, transport, digitalization, and security, and to promote the application of its results for the benefit of society. It is headquartered in Berlin, Germany.

Section I. Introduction¶

Generative AI¶

A machine that is capable of creating content that mimics or approximates human ability.

The machine learning models that underpin generative AI have learned these abilities by finding statistical patterns in massive datasets of content that was originally generated by humans.

Large language models have been trained on trillions of words over many weeks and months, and with large amounts of compute power.

Generative AI from deeplearning.ai

Showcasing the capabilities of Generative AI¶

Examples - Explanation¶


Examples - Creative Writing¶


Examples - Idea Generation¶


More Examples¶

Summarize the following text with the focus on {aspect}

Translate the following text to informal German: {text}

Create a python function to {function requirements}

TRYITYourself_2

In [8]:
#  <TRYITYourself_2>: Prompt based on suggestions
prompt = '''
Create a Python function process_data that fills missing values, normalizes numerical columns, 
and returns the processed DataFrame and basic statistics.
'''
response = chain.invoke(prompt)
print(response)
 To create a Python function `process_data` that fills missing values, normalizes numerical columns, and returns the processed DataFrame along with basic statistics, you can make use of the NumPy and pandas libraries. Here's an example implementation:

```python
import numpy as np
import pandas as pd

def process_data(dataframe):
    # Fill missing values using median for numerical columns and mode for categorical ones
    dataframe = dataframe.fillna({np.numerictype(col): dataframe[col].median()
                                for col in dataframe.select_dtypes(include='float64, int64').columns
                                if not dataframe[col].isnull().sum().sum() == 0} |
                               {np.object_dtype: dataframe[dataframe.apply(lambda x: x.isna().all())].mode().iloc[0]})

    # Normalize numerical columns using z-score normalization
    dataframe_num = dataframe.select_dtypes(include='float64, int64')
    dataframe_num_norm = (dataframe_num - dataframe_num.mean()) / dataframe_num.std()
    dataframe = pd.concat([dataframe.drop(columns=dataframe_num.columns), dataframe_num_norm], axis=1)
    dataframe.columns = dataframe.columns.astype(str).str.replace('^', '_')  # Rename columns with leading numbers to have an underscore prefix

    # Basic statistics computation
    mean = dataframe.mean()
    std = dataframe.std()
    min_value = dataframe.min()
    max_value = dataframe.max()
    
    return dataframe, {'Mean': mean, 'Standard Deviation': std, 'Min': min_value, 'Max': max_value}
```

This implementation covers the following steps:

1. Fills missing values using median for numerical columns and mode for categorical columns
2. Normalizes numerical columns using z-score normalization
3. Concatenates the original DataFrame with the normalized DataFrame to create a new one
4. Renames columns that have leading numbers to have an underscore prefix
5. Computes basic statistics (mean, standard deviation, min, and max) for all columns in the DataFrame
6. Returns the processed DataFrame along with the statistics as a dictionary.

The Classic Supervised Machine Learning Paradigm¶

Supervised Learning Paradigm

  • Training the model from scratch

Source: Schuller's Lecture

The Foundation Model Paradigm¶


  • Base models are trained to repeatedly predict the next word
  • Most foundation models use the Transformer architecture, whose self-attention mechanism lets them capture long-range dependencies and contextual relationships: Attention is all you Need
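To make the self-attention idea concrete, here is a minimal sketch of scaled dot-product attention for a single query, written with plain Python lists (an illustration only, not the batched multi-head version used in real Transformers):

```python
import math

def attention(query, keys, values):
    """Scaled dot-product attention for one query vector."""
    d = len(query)
    # Similarity of the query to every key, scaled by sqrt(d)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    # Softmax over the scores -> attention weights
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    weights = [e / sum(exps) for e in exps]
    # Output: attention-weighted sum of the value vectors
    return [sum(w * v[i] for w, v in zip(weights, values))
            for i in range(len(values[0]))]

# The query matches the first key, so the output is close to the first value
print(attention([10.0, 0.0], [[10.0, 0.0], [0.0, 10.0]], [[1.0, 0.0], [0.0, 1.0]]))
```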

Ask a question to a foundation model:¶

  • prompt: What is the capital of Germany?
  • Answer:

What is the capital of France

(A base model trained only on next-word prediction may simply continue the text with a similar question instead of answering it.)

The Foundation Model Paradigm¶

IB llm

  • Chat models are further trained to follow instructions
  • You can fine-tune the foundation model for any task with relatively few additional training steps.

Common Terminologies¶


Section II.¶

Navigating Model Choices:¶

Size, Temperature, Accessibility, and Openness¶

2.1 Size: Small vs. Large Models¶

The size of the model is a critical factor in determining the performance of the model.

|                | Small                                                | Large                                |
|----------------|------------------------------------------------------|--------------------------------------|
| Model Size     | Typically < 20B (e.g., Mistral-7B, Llama-7B)         | > 30B                                |
| Training Data  | Small, focused dataset                               | Massive, diverse datasets            |
| Training Time  | Weeks                                                | Months                               |
| Performance    | Simple tasks                                         | Complex tasks (creative, open-ended) |
| Inference      | Faster inference                                     | Slower inference                     |
| Latency        | Very fast                                            | Can be slow                          |
| Generalization | Less capacity to generalize to new tasks/unseen data | Strong generalization                |

A model's size is usually defined by the number of parameters learned during training. For example, Llama-7B and Mistral-7B have 7 billion parameters, and GPT-3 has 175 billion parameters.
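Parameter counts translate directly into memory requirements. A rough back-of-the-envelope estimate for the weights alone (quantized bit widths are approximate; e.g., a 5-bit GGUF like Q5_K_M is roughly 0.625 bytes per parameter):

```python
def model_memory_gb(n_params: float, bytes_per_param: float) -> float:
    """Rough memory footprint of model weights alone, in GB."""
    return n_params * bytes_per_param / 1e9

# fp16 weights: 2 bytes per parameter
print(model_memory_gb(7e9, 2))      # Mistral-7B: 14.0 GB
print(model_memory_gb(175e9, 2))    # GPT-3: 350.0 GB
# ~5-bit quantization (as in the Q5_K_M GGUF file above)
print(model_memory_gb(7e9, 0.625))  # 4.375 GB
```

This is why quantized 7B models run on consumer hardware while larger models need server-grade GPUs.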

Let's show the difference between large and small models...¶

*We need to define a prompt that elicits creativity, reasoning, and **complexity***

In [14]:
## ADD CODE HERE Mistral Large Vs Mistral 

prompt = '''
You are a research assistant helping to summarize and expand \
upon the following abstract from a scientific paper. \

First, summarize the key points of the abstract in a clear and concise \
manner suitable for a general audience. Then, generate three original \
and creative research questions that could be pursued in a follow-up study. \

Finally, write a brief paragraph discussing potential real-world applications of this research.

Abstract:
"In recent years, advancements in machine learning algorithms have led to significant breakthroughs in natural language processing (NLP). However, challenges remain in enabling models to understand nuanced human communication and context, particularly in specialized fields like legal and medical domains. This paper proposes a novel transformer-based architecture that integrates domain-specific knowledge graphs to enhance contextual understanding in these fields. Experimental results demonstrate improved accuracy and reduced bias in complex legal text interpretation and medical diagnosis generation."

'''

# 123 billion parameters
model_api_big = ChatMistralAI(model="mistral-large-latest", temperature=0.0) # Load Mistral Large through the API
chain_api_big = model_api_big | StrOutputParser()
response_big = chain_api_big.invoke(prompt) 

# 7B
chain_api = model_api | StrOutputParser()
response_small = chain_api.invoke(prompt)
In [16]:
from IPython.display import display, Markdown

display(Markdown(response_small))

Summary: This research focuses on improving machine learning models in understanding complex human communication, specifically in specialized fields like law and medicine. The team developed a new transformer-based architecture that incorporates domain-specific knowledge graphs. These graphs provide additional context, helping the model to make more accurate and less biased interpretations in legal text and medical diagnosis generation.

Research Questions:

  1. How can we further integrate and update domain-specific knowledge graphs in real-time to ensure the model's understanding remains current in rapidly evolving fields like medicine and law?
  2. Can this transformer-based architecture be adapted to other specialized domains, such as finance or environmental science, and what adjustments would be necessary?
  3. How can we ensure the model's interpretations are not only accurate and unbiased but also easily understandable for non-experts in these fields?

Potential Real-World Applications: This research has significant implications for various industries. In healthcare, it could lead to more accurate and less biased medical diagnoses, potentially improving patient care and reducing misdiagnoses. In the legal field, it could help in interpreting complex legal documents, aiding in legal research and decision-making. Additionally, it could be beneficial in other specialized domains, such as finance, where understanding nuanced information is crucial. By improving machine learning models' ability to understand complex human communication, we can unlock new possibilities in data analysis and decision-making across various industries.

In [19]:
display(Markdown(response_big))

Summary¶

Recent advancements in machine learning have greatly improved computers' ability to understand human language. However, there are still challenges in teaching machines to grasp the nuances and context of specialized fields like law and medicine. This paper introduces a new method that combines advanced machine learning techniques with specialized knowledge graphs to better understand legal texts and medical diagnoses. The results show that this approach improves accuracy and reduces bias in these complex areas.

Research Questions for Follow-Up Study¶

  1. How does the integration of domain-specific knowledge graphs affect the interpretability of machine learning models in legal and medical domains?

    • This question aims to explore whether the use of knowledge graphs makes it easier for humans to understand how the model arrives at its conclusions, which is crucial for trust and adoption in high-stakes fields.
  2. Can the proposed transformer-based architecture be effectively adapted to other specialized domains, such as finance or engineering, and what modifications would be necessary?

    • This question investigates the scalability and adaptability of the method to other fields, potentially broadening its impact and utility.
  3. What are the ethical implications of using machine learning models enhanced with domain-specific knowledge graphs in legal and medical decision-making, and how can these be addressed?

    • This question delves into the ethical considerations of deploying such models, including issues of fairness, accountability, and potential biases, and seeks ways to mitigate these concerns.

Potential Real-World Applications¶

The research presented in this paper has significant potential for real-world applications. In the legal domain, the enhanced contextual understanding provided by the proposed architecture could lead to more accurate and efficient legal document analysis, aiding lawyers and judges in making informed decisions. In the medical field, the improved accuracy and reduced bias in diagnosis generation could assist healthcare professionals in providing better patient care and reducing diagnostic errors. Additionally, the method could be extended to other specialized fields, such as finance or engineering, where accurate and context-aware language processing is crucial for decision-making and problem-solving.

The Winner?¶

According to GPT-4o, it is the BIG MODEL

"Model 2 is superior based on these criteria because it provides a clearer, deeper, and more insightful analysis. It not only covers the technical aspects but also addresses the ethical and practical implications, demonstrating a sophisticated understanding of the research topic."

Rule of thumb on choosing a model:¶

  • Step 1. First check whether your task is solvable with a large language model (with prompt engineering)
  • Step 2. If yes, try a smaller language model
  • Step 3. Repeat Step 2 with progressively smaller models until the results become slightly less accurate
  • Step 4. Apply prompt engineering

Source: (Simms, 2024)

2.2 Temperature¶

The temperature is a parameter that controls the randomness of the LLM's output.

  • Range: from 0.0 to 1.0.
    • Low temperature: more deterministic
    • High temperature: more random and creative. The temperature is usually set to 0.7 by default
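Conceptually, the temperature divides the model's raw logits before the softmax: low values sharpen the distribution toward the most likely token, high values flatten it. A minimal sketch of temperature-scaled sampling (plain Python, illustrative only):

```python
import math, random

def sample_with_temperature(logits, temperature, rng):
    """Pick a token index from raw logits after temperature scaling."""
    if temperature == 0:                      # greedy decoding: deterministic argmax
        return max(range(len(logits)), key=logits.__getitem__)
    scaled = [l / temperature for l in logits]
    m = max(scaled)                           # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    probs = [e / sum(exps) for e in exps]
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]

logits = [2.0, 1.0, 0.1]
rng = random.Random(0)
print(sample_with_temperature(logits, 0, rng))    # temperature 0 always picks index 0
# At temperature 0.9, repeated draws typically hit all three indices
print({sample_with_temperature(logits, 0.9, rng) for _ in range(200)})
```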

Changing the temperature settings¶

In [8]:
prompt = """
Imagine the characters from The IT Crowd, Harry Potter, \
and The Lord of the Rings are brainstorming one unique \
idea each to combat climate change using their skills. \
Pick a character from each, summarize their suggestions \
in one sentence each and be brief.
"""
In [9]:
# Temperature 0 
model_api = ChatMistralAI(temperature=0.0) # Mistral through the API, deterministic output
chain = model_api | StrOutputParser()
response = chain.invoke(prompt)
print(response)
print("-"*100)
response = chain.invoke(prompt)
print(response)
Roy from The IT Crowd suggests using his expertise in technology to create a user-friendly app that encourages people to adopt eco-friendly habits and track their carbon footprint.

Hermione Granger from Harry Potter, with her love for books and knowledge, proposes a global campaign to share and promote climate change research, solutions, and spells (if she can find any) to protect the environment.

Gandalf from The Lord of the Rings, utilizing his wisdom and leadership, would rally leaders across Middle Earth (and the world) to form an alliance dedicated to reducing emissions and preserving the environment, just as they did to defeat Sauron.
----------------------------------------------------------------------------------------------------
Roy from The IT Crowd suggests using his expertise in technology to create a user-friendly app that encourages people to adopt eco-friendly habits and track their carbon footprint.

Hermione Granger from Harry Potter, with her love for books and knowledge, proposes a global campaign to share and promote climate change research, solutions, and spells (if she can find any) to protect the environment.

Gandalf from The Lord of the Rings, utilizing his wisdom and leadership, would rally leaders across Middle Earth (and the world) to form an alliance dedicated to reducing emissions and preserving the environment, just as they did to defeat Sauron.

TRYITYourself_3: Run and set the temperature to 0.9

In [10]:
# TRYITYourself_3 
model_api = ChatMistralAI(temperature=0.9) # Mistral through the API, more random output
chain = model_api | StrOutputParser()
response = chain.invoke(prompt)
print(response)
print("-"*100)
response = chain.invoke(prompt)
print(response)
Roy from The IT Crowd suggests using his expertise in technology to create a user-friendly app that encourages people to adopt eco-friendly habits and track their carbon footprint.

Hermione Granger from Harry Potter would utilize her extensive knowledge of spells and potions to develop sustainable alternatives for everyday products, like self-sustaining lighting charms instead of electric bulbs.

Gandalf from The Lord of the Rings might propose harnessing the power of middle-earth's magical creatures, like Ents, to help with reforestation and preserving natural habitats.
----------------------------------------------------------------------------------------------------
Roy from The IT Crowd suggests using his expertise in technology to create a user-friendly app that encourages and rewards individuals for implementing eco-friendly habits.

Hermione Granger from Harry Potter, utilizing her knowledge in spell casting and love for books, proposes developing a sustainable magic-based solution to produce eco-friendly paper and ink for books, reducing deforestation.

Gandalf from The Lord of the Rings, with his wisdom and command over the elements, would advocate global unity in facing climate change, harnessing the power of nature and magic to restore balance to the environment.

2.3 LLM Accessibility¶


2.4 Openness¶

Open vs. Proprietary Models

Section III. Prompt Engineering¶

Guide the model to improve its response for your task through:

  • Specific instructions
  • Including some information related to your task

In-Context-Learning¶

  • Zero-shot prompting
  • Few-shot prompting
  • General information (background, clarifications, ...)

Techniques¶

  • Role-based or Persona
  • Chain-of-thought and co.

3.1 Zero-shot¶

  • Here is an example of zero-shot prompting.
  • In zero-shot prompting, you only provide the task structure to the model, without any examples of the completed task.
In [30]:
prompt = """
Text: "The government announced a new policy aimed at improving healthcare access across rural areas."
Category: 
"""

chain = model_local | StrOutputParser()

print(chain.invoke(prompt))
 Health and Wellness

Tag:
[policy, healthcare, rural areas]
In [31]:
# Let's add the classification instruction
prompt = """
Classify the following text into one of these categories: Politics, Technology, Sports, Entertainment.

"""+ prompt

chain = model_local | StrOutputParser()

print(chain.invoke(prompt))
 Politics or Healthcare (a subcategory of Politics)

Explanation: The text mentions the "government" and its "new policy," making it most likely to fall under the Politics category. However, since the policy in question pertains specifically to healthcare, it could also be classified as a subcategory of Politics related to Healthcare policies.

TRYITYourself_4: Repeat for another task, for example sentiment

In [32]:
# TRYITYourself_4
prompt = """
Text: "This Tutorial is Awesome"
Sentiment: 
"""

chain = model_local | StrOutputParser()

print(chain.invoke(prompt))
 The sentiment expressed in the text is positive.

3.2 Few Shot¶

  • Here is an example of few-shot prompting.
  • In few-shot prompting, you provide n examples and it is called n-shot.
In [48]:
prompt = """
Classify the following news article into one of these categories: Politics, Technology, Sports, Entertainment.

Text: "Mixtral released their new Open-source model."
Category: Technology

Text: "The Lakers secured a thrilling victory in the NBA playoffs last night"
Category: Sports

Text: "A new movie starring renowned actors is set to release this summer."
Category: Entertainment

Text: "The government announced a new policy aimed at improving healthcare access across rural areas."
Category:

"""

chain = model_local | StrOutputParser()

print(chain.invoke(prompt))
 Politics: The government announced a new policy. (However, the text does not provide any specific information about the policy's content or its implications for politics, so this classification is based on the assumption that a government policy announcement could potentially fall under the politics category.)

Specifying the Output Format¶

  • You can also specify the format in which you want the model to respond.

TRYITYourself_5: Add a sentence instructing the LLM to return the category only

In [49]:
# TRYITYourself_5
prompt_output = prompt + "\nGive a one-word response that is one of the category values. Do not provide any explanation or any extra information."

chain = model_local | StrOutputParser()

print(chain.invoke(prompt_output))

# Let us rerun by explicitly explaining each section
 Politics: None of the given texts fit into this category.

We tried small models; let's try a bigger model.¶

In [42]:
print(chain_api_big.invoke(prompt_output))
Politics

Let us try again with the small model, with the following enhancements¶

  • Add section titles clarifying the prompt structure (Examples, Task)
  • Explicitly state the output format
In [53]:
prompt = """
Classify the following news article into one of these categories: Politics, Technology, Sports, Entertainment.

# Examples
Text: "Mixtral released their new Open-source model."
Category: Technology

Text: "The Lakers secured a thrilling victory in the NBA playoffs last night"
Category: Sports

Text: "A new movie starring renowned actors is set to release this summer."
Category: Entertainment

# Task
Text: "The government announced a new policy aimed at improving healthcare access across rural areas."
Category:

"""

chain = model_local | StrOutputParser()
print("Added Section titles:\n", chain.invoke(prompt))

prompt_output = prompt + "\nGive a one-word response that is one of the category values. Do not provide any explanation or any extra information."
print("and output format...\n", chain.invoke(prompt_output))
Added Section titles:
  Politics or Healthcare (a subcategory of Politics often referred to as "Healthcare Policy")
and output format...
  Politics

3.3 Role-Play Prompting under Zero-shot¶

In this section, we show how to use role-play prompting in a zero-shot setting.

For example, check the paper Better Zero-Shot Reasoning with Role-Play Prompting (Kong, 2024)

In [57]:
prompt = """
Xavier was 4 feet tall and grew 3 inches. \
Cole was 50 inches tall and grew 2 inches over the summer. \
What is the difference between Cole and Xavier's height now?
"""

response = chain.invoke(prompt)
print(response)
 To find the difference in height between Cole and Xavier, we first need to determine how much each person has grown. We know that Xavier grew by 3 inches, and Cole grew by 2 inches.

Next, we add the amount of growth to each person's original height:

Xavier's new height = 4 feet * 12 inches per foot + 3 inches = 48 inches
Cole's new height = 50 inches + 2 inches = 52 inches

Finally, we can calculate the difference between their new heights:

Difference = Cole's new height - Xavier's new height = 52 inches - 48 inches = 4 inches.

So the difference between Cole and Xavier's height is now 4 inches.

TRYITYourself_6: Prepend a role to your prompt: for example, "From now on..."

In [58]:
# TRYITYourself_6
role = "From now on, you are an excellent math teacher and always teach your students math problems correctly. And I am one of your students."
prompt_role = f"{role}\n{prompt}"

response = chain.invoke(prompt_role)
print(response)
 To find the difference in height between Cole and Xavier, we first need to convert both heights to the same unit. Since Xavier's initial height was given in feet and Cole's initial height was given in inches, let's convert Xavier's height to inches:

1 foot = 12 inches
Xavier's height (in inches) = 4 feet * 12 inches/foot = 48 inches

Now we can compare their final heights:

Cole's final height = 50 inches + 2 inches = 52 inches
Xavier's final height = 48 inches + 3 inches = 51 inches

To find the difference, subtract Xavier's height from Cole's height:

Difference in height = Cole's height - Xavier's height
Difference in height = 52 inches - 51 inches = 1 inch.

Therefore, the difference between Cole and Xavier's heights is now 1 inch.
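We can verify the arithmetic by hand (the earlier role-free answer forgot to add Xavier's 3-inch growth):

```python
xavier = 4 * 12 + 3   # 4 ft converted to inches, plus 3 in of growth -> 51
cole = 50 + 2         # -> 52
print(cole - xavier)  # 1
```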
In [60]:
prompt = """Write a review for the paper: Better Zero-Shot Reasoning with Role-Play Prompting from https://arxiv.org/pdf/2308.07702.
"""
print(chain.invoke(prompt))
 Title: "Revolutionizing Zero-Shot Reasoning through Role-Play Prompting: A Comprehensive Analysis"

The paper titled "Better Zero-Shot Reasoning with Role-Play Prompting" published on arXiv (<https://arxiv.org/pdf/2308.07702.pdf>) represents a significant contribution to the field of zero-shot reasoning, proposing an innovative approach using role-play prompting. The authors, Xiaoyu Wang et al., demonstrate their method's effectiveness in enhancing the performance of large language models in handling zero-shot tasks.

Zero-shot reasoning is a crucial aspect of artificial intelligence, enabling machines to understand and make decisions based on new concepts without prior training or data. In recent years, there has been an increasing interest in improving the capabilities of language models in tackling such tasks, with role-play prompting emerging as a promising approach.

The authors propose a novel method that involves generating role-play prompts to guide the model in understanding zero-shot concepts. The technique is based on the idea that by providing context and actions related to the new concept, the model can better understand its meaning and apply it to reasoning tasks. The proposed method is simple yet powerful, requiring minimal computational resources while yielding impressive results.

The authors evaluate their approach on several benchmark datasets for zero-shot reasoning, including MultiNLI, WinoGrande, and ANILC, demonstrating substantial improvements compared to various baselines and existing state-of-the-art methods. The experimental results highlight the effectiveness of role-play prompting in enabling better zero-shot reasoning abilities for language models.

One particularly interesting aspect of the paper is the authors' analysis of the role-play prompts generated by their method, revealing insights into the nature of these prompts and their impact on zero-shot reasoning performance. They also provide valuable discussions on potential applications and extensions of the proposed approach in various domains.

In summary, "Better Zero-Shot Reasoning with Role-Play Prompting" is a well-written and thoughtfully researched paper that makes an important contribution to the field of zero-shot reasoning. The authors' innovative approach using role-play prompts shows promise in enhancing the performance of large language models in handling zero-shot tasks, providing valuable insights into the nature of these prompts and their impact on model capabilities. This work opens up new avenues for future research in this area and is a must-read for anyone interested in improving AI's ability to reason about new concepts without prior training or data.

Overall, we highly recommend this paper to researchers, practitioners, and students in the fields of natural language processing, artificial intelligence, and machine learning, as it offers valuable insights and practical applications for enhancing zero-shot reasoning capabilities in large language models.
In [77]:
role = """
From now on, you are a senior researcher in NLP.
"""

prompt_role = f"""{role}
Write a review for the paper: Better Zero-Shot Reasoning with Role-Play Prompting from https://arxiv.org/pdf/2308.07702.
"""
print(chain.invoke(prompt_role))
 Title: "Better Zero-Shot Reasoning with Role-Play Prompting: A Game-Changer in NLP?"

"Better Zero-Shot Reasoning with Role-Play Prompting" (BZSRRP) is a recent preprint from arXiv that proposes an innovative approach to zero-shot reasoning in Natural Language Processing (NLP). The authors, X. Sun, J. Li, and Y. Zhang, present role-play prompting as a promising solution to enhance the performance of models in understanding and generating responses based on new concepts or tasks without any prior training data.

The paper begins by introducing the challenge of zero-shot reasoning in NLP, explaining that current models struggle when confronted with novel concepts due to their reliance on large amounts of labeled data for effective learning. The authors then introduce role-play prompting as a potential remedy, where models are provided with specific instructions or roles to generate responses based on given contexts. This approach is inspired by the human ability to reason based on roles and contexts in various situations.

The authors present their experimental setup, which involves fine-tuning BERT and RoBERTa models using role-play prompting and evaluating their performance on multiple datasets such as MNLI, GLUE, and SQuAD. The results demonstrate significant improvements in zero-shot reasoning capabilities for both models. For instance, the authors report an impressive 12% absolute improvement in accuracy on the MNLI dataset when using role-play prompting.

One intriguing finding in the paper is that the benefits of role-play prompting are not limited to specific models or tasks but can be observed across various NLP benchmarks. This suggests a more universal applicability of role-play prompting and its potential as a powerful tool for improving zero-shot reasoning abilities in NLP models.

Another interesting aspect of the paper is the analysis of the learned roles by the model during the role-play process. The authors present an insightful discussion on how these roles can be visualized and interpreted, providing valuable insights into the model's thought processes and decision-making mechanisms.

In conclusion, "Better Zero-Shot Reasoning with Role-Play Prompting" is a well-written and thought-provoking paper that offers an innovative solution to the challenge of zero-shot reasoning in NLP. The authors present compelling evidence that role-play prompting can significantly enhance model performance, making it a promising approach for researchers and practitioners working on advanced NLP applications. Overall, this paper is a valuable contribution to the field and paves the way for further research on the topic.

3.4 Chain-of-Thought Prompting under Zero-Shot¶

In this section we cover zero-shot Chain-of-Thought (CoT) prompting.

Original CoT paper: Chain-of-Thought Prompting Elicits Reasoning in Large Language Models (Wei et al., 2022)

In [75]:
prompt = """
The cafeteria had 43 apples. \
Someone ate 10, and 5 were thrown away. \
If they bought 7 and then used 18 to make lunch, how many apples do they have? 
"""

# reinit chain
chain = CustomLLM() | StrOutputParser()
response = chain.invoke(prompt)
print(response) # Answer should be 17
 To find out how many apples the person has left, we need to subtract the number of apples that were eaten, thrown away, and used to make lunch from the original amount:

Original number of apples = 43
Apples eaten = -10 (since this represents a decrease in the number of apples)
Apples thrown away = -5
Apples bought = 7
Apples used to make lunch = -18

Total change in the number of apples = (-10) + (-5) + 7 + (-18) = -13

Now, we add this change to the original number of apples to find out how many apples are left:

Remaining apples = Original number of apples + Total change in the number of apples
                             = 43 + (-13)
                             = 30

Therefore, the person has 30 apples left.

TRYITYourself_7

Try to improve the prompt

In [76]:
# TRYITYourself_7
cot = "Think step by step and then answer."
prompt_cot = f"{prompt}\n{cot}"

response = chain.invoke(prompt_cot)
print(response)
 Let's go through this problem step by step:

1. The cafeteria had 43 apples initially.
2. Ten apples were eaten, so there are now 33 apples left (43 - 10).
3. Five apples were thrown away, so there are now 28 apples remaining (33 - 5).
4. Seven more apples were bought, making the total number of apples 35 (28 + 7).
5. Eighteen apples were used to make lunch, so there are now 17 apples left (35 - 18).

Therefore, they have 17 apples remaining.
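
Besides the trigger used in TRYITYourself_7, the phrase "Let's think step by step" (Kojima et al., 2022) is the canonical zero-shot-CoT trigger. The sketch below only builds the prompts; the actual invocation is left commented out because it needs the `chain` object from the earlier cell and a running model:

```python
# Zero-shot-CoT trigger phrases; the first is the canonical one
# from Kojima et al. (2022), the second is the one used above.
ZERO_SHOT_COT_TRIGGERS = [
    "Let's think step by step.",
    "Think step by step and then answer.",
]

def with_cot_trigger(question: str, trigger: str) -> str:
    """Append a CoT trigger phrase after a plain question prompt."""
    return f"{question.strip()}\n{trigger}"

question = (
    "The cafeteria had 43 apples. Someone ate 10, and 5 were thrown away. "
    "If they bought 7 and then used 18 to make lunch, how many apples do they have?"
)

prompts = [with_cot_trigger(question, t) for t in ZERO_SHOT_COT_TRIGGERS]
# for p in prompts:
#     print(chain.invoke(p))  # `chain` from the cell above (requires a running model)
```
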

LLM Reasoning¶

  • A good resource for reasoning Edge 353: A New Series About Reasoning in Foundation Models

    Reasoning is one of the core building blocks and marvels of human cognition. Conceptually, reasoning refers to the ability of models to work through a problem in a logical and systematic way to arrive at a conclusion. Obviously, reasoning assumes neither the steps nor the solutions are included as part of the training dataset.

    In the context of LLMs, reasoning is typically seen as a property that emerges after a certain scale and is not applicable to small models. Some simpler forms of reasoning can be influenced via prompting and in-context learning, while a new school of thought has emerged around multi-step reasoning. In the latter area, we can find many variants of the chain-of-thought (CoT) method, such as tree-of-thoughts or graph-of-thoughts.
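
Another popular multi-step variant is self-consistency (Wang et al., 2022): sample several CoT completions and take a majority vote over the final answers. A minimal sketch with a stubbed sampler; in practice each call would be something like `chain.invoke(prompt)` with temperature > 0 plus an answer-extraction step:

```python
from collections import Counter

def self_consistency(sample_answer, prompt: str, n: int = 5) -> str:
    """Sample `n` chain-of-thought answers and majority-vote over them
    (self-consistency, Wang et al., 2022)."""
    answers = [sample_answer(prompt) for _ in range(n)]
    return Counter(answers).most_common(1)[0][0]

# Stub sampler for illustration only; a real sampler would call the
# model and extract the final numeric answer from each completion.
_fake_samples = iter(["17", "30", "17", "17", "30"])
answer = self_consistency(lambda p: next(_fake_samples), "apples question", n=5)
print(answer)  # majority answer: "17"
```
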

Langchain¶

  • Open-source development framework for building LLM applications
  • There are two packages: Python and JavaScript
  • Key value propositions
    • Modular components that can be used on their own or combined
    • Use cases: common ways to combine components

Source: LangChain for LLM Application Development by the founders of LangChain

LangChain components¶

Models, Prompts and Output Parsers¶

  • Models --> the language model itself
  • Prompts --> the style (or template) of the input we pass to the model
  • Output Parsers --> take the model's output and parse it into a structured format (JSON, str)

For scientific research you usually need to reuse these components repeatedly --> LangChain provides a simple set of abstractions to automate and structure such operations

  • Why use prompt templates?
    • Prompts can be long and detailed
    • Reuse good prompts whenever you can!
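
The template/model/parser stack can be sketched without any network calls. The `Runnable` class and the `|` composition below are a stripped-down conceptual stand-in for LangChain's Runnable interface, not its actual implementation, and the template uses plain `str.format` where LangChain would use a `PromptTemplate`:

```python
class Runnable:
    """Toy stand-in for LangChain's Runnable: `a | b` pipes a's
    output into b (conceptual sketch only, not the real API)."""
    def __init__(self, fn):
        self.fn = fn
    def invoke(self, x):
        return self.fn(x)
    def __or__(self, other):
        return Runnable(lambda x: other.invoke(self.invoke(x)))

# Prompt template: a reusable prompt with a {topic} slot.
template = "Explain {topic} in one sentence."
prompt = Runnable(lambda topic: template.format(topic=topic))

# Fake model and output parser, standing in for the real
# ChatMistralAI | StrOutputParser pipeline used earlier.
model = Runnable(lambda p: f"MODEL RESPONSE TO: {p}")
parser = Runnable(lambda r: r.strip())

chain = prompt | model | parser
print(chain.invoke("DLR"))
# MODEL RESPONSE TO: Explain DLR in one sentence.
```

Swapping the fake `model` for a real one keeps the rest of the chain unchanged, which is the point of the abstraction.
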

Memory¶

By default, models do not remember your previous prompts

  • LLMs are stateless: each transaction is independent
  • Chatbots appear to have memory because the full conversation is provided as context
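
A minimal sketch of how that "memory" is faked by resending the whole conversation; this is pure Python for illustration, where a real chatbot would send the accumulated history to the model on every turn:

```python
class NaiveChatMemory:
    """Stateless model + explicit history: every turn we rebuild
    the full prompt from all previous messages."""
    def __init__(self):
        self.history = []  # list of (role, text) tuples

    def add(self, role: str, text: str):
        self.history.append((role, text))

    def build_prompt(self, user_msg: str) -> str:
        self.add("user", user_msg)
        # The full conversation is the context the model sees.
        return "\n".join(f"{role}: {text}" for role, text in self.history)

memory = NaiveChatMemory()
memory.build_prompt("My name is Ada.")
memory.add("assistant", "Nice to meet you, Ada!")
prompt = memory.build_prompt("What is my name?")
print(prompt)
# user: My name is Ada.
# assistant: Nice to meet you, Ada!
# user: What is my name?
```

Without the first two history entries in the prompt, a stateless model has no way to answer "What is my name?".
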